AITopics | quantile transformation

Collaborating Authors

quantile transformation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GEM-T: Generative Tabular Data via Fitting Moments

Li, Miao, Nguyen, Phuc, Tam, Christopher, Morgan, Alexandra, Ge, Kenneth, Bansal, Rahul, Yu, Linzi, Arnaout, Rima, Arnaout, Ramy

arXiv.org Machine LearningSep-23-2025

Tabular data dominates data science but poses challenges for generative models, especially when the data is limited or sensitive. We present a novel approach to generating synthetic tabular data based on the principle of maximum entropy -- MaxEnt -- called GEM-T, for ``generative entropy maximization for tables.'' GEM-T directly captures nth-order interactions -- pairwise, third-order, etc. -- among columns of training data. In extensive testing, GEM-T matches or exceeds deep neural network approaches previously regarded as state-of-the-art in 23 of 34 publicly available datasets representing diverse subject domains (68\%). Notably, GEM-T involves orders-of-magnitude fewer trainable parameters, demonstrating that much of the information in real-world data resides in low-dimensional, potentially human-interpretable correlations, provided that the input data is appropriately transformed first. Furthermore, MaxEnt better handles heterogeneous data types (continuous vs. discrete vs. categorical), lack of local structure, and other features of tabular data. GEM-T represents a promising direction for light-weight high-performance generative models for structured data.

dataset, synthetic data, training data, (15 more...)

arXiv.org Machine Learning

2509.17752

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > United States > Wisconsin (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.94)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Deep Learning for GWP Prediction: A Framework Using PCA, Quantile Transformation, and Ensemble Modeling

Rajapriya, Navin, Kawajiri, Kotaro

arXiv.org Artificial IntelligenceNov-28-2024

Developing environmentally sustainable refrigerants is critical for mitigating the impact of anthropogenic greenhouse gases on global warming. This study presents a predictive modeling framework to estimate the 100-year global warming potential (GWP 100) of single-component refrigerants using a fully connected neural network implemented on the Multi-Sigma platform. Molecular descriptors from RDKit, Mordred, and alvaDesc were utilized to capture various chemical features. The RDKit-based model achieved the best performance, with a Root Mean Square Error (RMSE) of 481.9 and an R2 score of 0.918, demonstrating superior predictive accuracy and generalizability. Dimensionality reduction through Principal Component Analysis (PCA) and quantile transformation were applied to address the high-dimensional and skewed nature of the dataset,enhancing model stability and performance. Factor analysis identified vital molecular features, including molecular weight, lipophilicity, and functional groups, such as nitriles and allylic oxides, as significant contributors to GWP values. These insights provide actionable guidance for designing environmentally sustainable refrigerants. Integrating RDKit descriptors with Multi-Sigma's framework, which includes PCA, quantile transformation, and neural networks, provides a scalable solution for the rapid virtual screening of low-GWP refrigerants. This approach can potentially accelerate the identification of eco-friendly alternatives, directly contributing to climate mitigation by enabling the design of next-generation refrigerants aligned with global sustainability objectives.

artificial intelligence, descriptor, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.19124

Genre: Research Report (0.64)

Industry:

Energy > Oil & Gas (0.55)
Materials > Chemicals (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Integrating Marketing Channels into Quantile Transformation and Bayesian Optimization of Ensemble Kernels for Sales Prediction with Gaussian Process Models

Mirshekari, Shahin, Motedayen, Negin Hayeri, Ensaf, Mohammad

arXiv.org Artificial IntelligenceJun-10-2024

This study introduces an innovative Gaussian Process (GP) model utilizing an ensemble kernel that integrates Radial Basis Function (RBF), Rational Quadratic, and Mat\'ern kernels for product sales forecasting. By applying Bayesian optimization, we efficiently find the optimal weights for each kernel, enhancing the model's ability to handle complex sales data patterns. Our approach significantly outperforms traditional GP models, achieving a notable 98\% accuracy and superior performance across key metrics including Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Coefficient of Determination ($R^2$). This advancement underscores the effectiveness of ensemble kernels and Bayesian optimization in improving predictive accuracy, offering profound implications for machine learning applications in sales forecasting.

ensemble kernel, kernel, quantile transformation, (7 more...)

arXiv.org Artificial Intelligence

2404.09386

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > District of Columbia > Washington (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Marketing (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.47)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Balancing central and marginal rejection when combining independent significance tests

Salahub, Chris, Oldford, Wayne

arXiv.org Artificial IntelligenceNov-13-2023

A common approach to evaluating the significance of a collection of $p$-values combines them with a pooling function, in particular when the original data are not available. These pooled $p$-values convert a sample of $p$-values into a single number which behaves like a univariate $p$-value. To clarify discussion of these functions, a telescoping series of alternative hypotheses are introduced that communicate the strength and prevalence of non-null evidence in the $p$-values before general pooling formulae are discussed. A pattern noticed in the UMP pooled $p$-value for a particular alternative motivates the definition and discussion of central and marginal rejection levels at $\alpha$. It is proven that central rejection is always greater than or equal to marginal rejection, motivating a quotient to measure the balance between the two for pooled $p$-values. A combining function based on the $\chi^2_{\kappa}$ quantile transformation is proposed to control this quotient and shown to be robust to mis-specified parameters relative to the UMP. Different powers for different parameter settings motivate a map of plausible alternatives based on where this pooled $p$-value is minimized.

centrality quotient, marginal rejection level, rejection level, (14 more...)

arXiv.org Artificial Intelligence

2310.166

Country: North America > United States (0.14)

Genre: Research Report > Experimental Study (0.91)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

The Kernel Density Integral Transformation

McCarter, Calvin

arXiv.org Machine LearningOct-19-2023

Feature preprocessing continues to play a critical role when applying machine learning and statistical methods to tabular data. In this paper, we propose the use of the kernel density integral transformation as a feature preprocessing step. Our approach subsumes the two leading feature preprocessing methods as limiting cases: linear min-max scaling and quantile transformation. We demonstrate that, without hyperparameter tuning, the kernel density integral transformation can be used as a simple drop-in replacement for either method, offering protection from the weaknesses of each. Alternatively, with tuning of a single continuous hyperparameter, we frequently outperform both of these methods. Finally, we show that the kernel density transformation can be profitably applied to statistical data analysis, particularly in correlation analysis and univariate clustering.

artificial intelligence, machine learning, transformation, (16 more...)

arXiv.org Machine Learning

2309.10194

Country:

North America > United States > California (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
(2 more...)

Add feedback

Electricity Theft Detection with self-attention

Finardi, Paulo, Campiotti, Israel, Plensack, Gustavo, de Souza, Rafael Derradi, Nogueira, Rodrigo, Pinheiro, Gustavo, Lotufo, Roberto

arXiv.org Machine LearningFeb-14-2020

In this work we propose a novel self-attention mechanism model to address electricity theft detection on an imbalanced realistic dataset that presents a daily electricity consumption provided by State Grid Corporation of China. Our key contribution is the introduction of a multi-head self-attention mechanism concatenated with dilated convolutions and unified by a convolution of kernel size $1$. Moreover, we introduce a binary input channel (Binary Mask) to identify the position of the missing values, allowing the network to learn how to deal with these values. Our model achieves an AUC of $0.926$ which is an improvement in more than $17\%$ with respect to previous baseline work. The code is available on GitHub at https://github.com/neuralmind-ai/electricity-theft-detection-with-self-attention.

dataset, electricity theft detection, transformation, (13 more...)

arXiv.org Machine Learning

2002.06219

Country:

Asia > China (0.24)
Europe > Ireland (0.14)
South America > Brazil > São Paulo (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

Splitting matters: how monotone transformation of predictor variables may improve the predictions of decision tree models

Galili, Tal, Meilijson, Isaac

arXiv.org Machine LearningNov-14-2016

It is widely believed that the prediction accuracy of decision tree models is invariant under any strictly monotone transformation of the individual predictor variables. However, this statement may be false when predicting new observations with values that were not seen in the training-set and are close to the location of the split point of a tree rule. The sensitivity of the prediction error to the split point interpolation is high when the split point of the tree is estimated based on very few observations, reaching 9% misclassification error when only 10 observations are used for constructing a split, and shrinking to 1% when relying on 100 observations. This study compares the performance of alternative methods for split point interpolation and concludes that the best choice is taking the mid-point between the two closest points to the split point of the tree. Furthermore, if the (continuous) distribution of the predictor variable is known, then using its probability integral for transforming the variable ("quantile transformation") will reduce the model's interpolation error by up to about a half on average. Accordingly, this study provides guidelines for both developers and users of decision tree models (including bagging and random forest).

artificial intelligence, estimator, machine learning, (14 more...)

arXiv.org Machine Learning

1611.04561

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report > Experimental Study (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback